PSYCHOPHYSIOLOGY 
Copyright © 1991 by The Society for Psychophysiological Research, Inc. 


Vol. 28, No. 5 
Printed in U.S.A. 


The Truth Will Out: Interrogative Polygraphy (“Lie
Detection”) With Event-Related Brain Potentials


LAWRENCE A. FARWELL AND EMANUEL DONCHIN 
Cognitive Psychophysiology Laboratory, University of Illinois at Urbana-Champaign 


ABSTRACT 


The feasibility of using Event Related Brain Potentials (ERPs) in Interrogative Polygraphy (“Lie 
Detection”) was tested by examining the effectiveness of the Guilty Knowledge Test designed by 
Farwell and Donchin (1986, 1988). The subject is assigned an arbitrary task requiring discrimination 
between experimenter-designated targets and other, irrelevant stimuli. A group of diagnostic items 
(“probes”), which to the unwitting are indistinguishable from the irrelevant items, are embedded 
among the irrelevant. For subjects who possess “guilty knowledge” these probes are distinct from 
the irrelevants and are likely to elicit a P300, thus revealing their possessing the special knowledge 
that allows them to differentiate the probes from the irrelevants. We report two experiments in 
which this paradigm was tested. In Experiment 1, 20 subjects participated in one of two mock 
espionage scenarios and were tested for their knowledge of both scenarios. All stimuli consisted of 
short phrases presented for 300 ms each at an interstimulus interval of 1550 ms. A set of items were 
designated as “targets” and appeared on 17% of the trials. Probes related to the scenarios also 
appeared on 17% of the trials. The rest of the items were irrelevants. Subjects responded by pressing 
one switch following targets, and the other following irrelevants (and, of course, probes). ERPs were 
recorded from Fz, Cz, and Pz. As predicted, targets elicited large P300s in all subjects. Probes 
associated with a given scenario elicited a P300 in subjects who participated in that scenario. A 
bootstrapping method was used to assess the quality of the decision for each subject. The algorithm 
declared the decision indeterminate in 12.5% of the cases. In all other cases a decision was made. 
There were no false positives and no false negatives: whenever a determination was made it was 
accurate. The second experiment was virtually identical to the first, with identical results, except
that this time 4 subjects were tested, each of whom had a minor brush with the law. Subjects were
tested to determine whether they possessed information on their own “crimes.” The results were as 
expected; the Guilty Knowledge Test determined correctly which subject possessed which
information. The implications of these data both for the practice of Interrogative Polygraphy and the
interpretation of the P300 are discussed.


DESCRIPTORS: Event-related potentials, Lie detection, Guilty Knowledge Test, P300. 


We report an examination of the feasibility of
adopting an approach to Interrogative Polygraphy¹
(Farwell & Donchin, 1986, 1988, 1989) that bases
inferences regarding the degree to which subjects
possess knowledge they are hiding on measures of
attributes of various components of the event-related
brain potential (ERP). This approach is based on the
premise that some ERP components are manifestations,
at the scalp, of the activity of subsystems that
execute specific information-processing tasks
(Donchin, 1981; Donchin & Coles, 1988; Donchin,
Ritter, & McCallum, 1978; Hillyard & Kutas, 1983).

The research reported here was supported in part by
contract number 87F350800 with the Central Intelligence
Agency. Preliminary reports were presented at the 1986,
1988, and 1989 meetings of the Society for
Psychophysiological Research. We acknowledge with thanks the
comments of Bill Gehring, Greg Miller, and David Lykken
on earlier versions of this manuscript. The project could
not have been conducted without the technical support
of the staff of the Cognitive Psychophysiology Laboratory
(CPL). Special thanks are due to Brian Foote for his
programming, and to Mike Anderson for his technical
wizardry. Many of our CPL colleagues have helped with the
design of these experiments and their interpretation as
they unfolded. Particular thanks are due to Christine
Collins and Gregory Miller.

Address requests for reprints to: Emanuel Donchin,
Department of Psychology, University of Illinois, 603 East
Daniel, Champaign, Illinois 61820.

¹We use the term Interrogative Polygraphy to refer to
a body of techniques that is popularly known as “lie
detection,” and commonly referred to by its practitioners as
“polygraphy.” The popular term is inaccurate and
misleading, because none of the techniques actually detect
lies (see Ekman, 1985). The term polygraphy is too broad
if left unqualified. All psychophysiologists use polygraphs.
This paper is concerned with those psychophysiologists
who are using polygraphs as aids in interrogations, and
that is why the term interrogative polygraphy is deemed
appropriate.

Although the use of ANS (autonomic nervous 
system) responses in Interrogative Polygraphy has 
many advocates (Barland & Raskin, 1975; Office of 
Technology Assessment, 1983; Podlesny & Raskin, 
1978; Raskin, 1986; Reid & Inbau, 1977) and the 
technique is widely employed by government agen- 
cies and by private practitioners, it suffers a number 
of inherent difficulties (Ben-Shakhar & Furedy, 
1990; Furedy, 1986; Lykken, 1981). It is worthwhile 
to examine the feasibility of augmenting the tool 
kit available to Interrogative Polygraphy. It should 
be clear, however, that the study reported here rep- 
resents but the first stage in the development of an 
interrogatory technique. Our purpose is to demonstrate
that it is possible, at least under laboratory
conditions, to use ERPs in the detection of con- 
cealed information. This demonstration, if persu- 
asive, should lead to further research that will ex- 
amine the efficacy of the procedure under field con- 
ditions. 

The Interrogative Polygraphy procedure we ex- 
amine in this report utilizes the P300 component 
of the ERP in the context of the “oddball” paradigm 
(Fabiani, Gratton, Karis, & Donchin, 1987). The 
subject is presented with a series of events and with 
a classification rule that places each event into one 
of two categories. The classification rule can range 
from the concrete to the abstract (i.e., frequency 
differences of tones, exemplars of two different cat- 
egories). The events can be quite diverse, and the 
categorization may depend on an ensemble of prop- 
erties of the events. The subject must be assigned 
a task that requires the categorization of the events. 
The series is so constructed that events from the 
two categories occur in a random sequence. Fur- 
thermore, one of the categories occurs less frequent- 
ly than the other. When these conditions are sat- 
isfied, the events that belong in the rare category 
elicit a large P300. The amplitude of the P300 is 
inversely proportional to the probability of the elic- 
iting event-category and directly proportional to the 
relevance of the event to the subject’s task (Don- 
chin, Karis, Bashore, Coles, & Gratton, 1986; Dun- 
can-Johnson & Donchin, 1977; Squires, Squires, & 
Hillyard, 1975). 

Thus, the appearance of a P300 in response to 
the events that belong to the rare category in an 
oddball series indicates that the subject has correctly
categorized them. This feature of the oddball
paradigm suggests its use in Interrogative Polyg- 
raphy. One constructs a series whose elements 
would appear homogeneous to the innocent, where- 
as for the guilty (who are concealing information) 
they fall into two categories. If the critical items, 
which are distinct from the rest of the series by 
virtue of their association with the crime, elicit a 
P300, then one may conclude that the subject does 
make the discriminations that only the guilty can 
make. It is critical to note that for the P300 to be 
elicited the subject must be assigned some task 
whose performance depends on the processing of 
the events. If a two-category series appears ho- 
mogeneous to the innocent, no task can be assigned 
unless one directs the subject’s attention to the very 
items that are in principle indiscriminable, thus de- 
feating the purpose of the test. Farwell and Donchin 
(1986, 1988, 1989) solved this problem by creating 
an oddball series consisting of three subseries. One 
subseries included stimuli that occurred 17% of the 
time and were defined as “targets” by virtue of their 
inclusion on a list of items that the subject was 
instructed to detect. The remaining stimuli, pre- 
sented on 83% of the trials, were not included in 
the list and were, therefore, defined for the subject 
as “irrelevant.” However, some of these irrelevant
items (17% of the stimuli) were in fact “crime-relevant”
or “probe” stimuli. Their relevance, though,
was known only to the investigator and to the 
“criminal.” Thus, the innocent subject is presented 
with a standard two-category oddball series in 
which the targets would be expected to elicit a P300 
whereas the irrelevant stimuli would not. For the 
guilty, who possess the extra knowledge, the crime- 
relevant probes stand out as a third, rare category. 

This procedure is closely related to the Guilty 
Knowledge Test (GKT) first proposed by Lykken 
(1959; see also Lykken, 1981). The GKT challenges 
the subject with a series of multiple choice ques- 
tions, one of the choices for each question being a 
detail knowledge of which indicates guilt. For example,
“Was the getaway car a (1) red Ford, (2) yellow
Toyota, (3) pink Honda, (4) grey Chevy, or (5)
white Plymouth?” For the innocent none of the
alternatives carries any significance. The robber
presumably knows that she absconded in a yellow
Toyota. Electrodermal responses and, in some cases,
cardiovascular and pulmonary responses are re- 
corded, and a consistently enhanced response to the 
critical option is taken as an indication of guilt. The 
GKT provides an important control against false 
positives in that the subject’s response to the non- 
critical items serves as a control, providing a base- 
line of the responsiveness to a question implying 
her guilt, even when she does not know which car 


September, 1991 


was used in the robbery. Note, however, that the 
GKT has but a weak control for false negatives. 
Absence of an enhanced response to any option 
cannot be readily evaluated because nothing in the 
procedure forces the subject to respond to any of 
the items. 

There are two noteworthy features of the pro- 
cedure described in this report. By selecting an ar- 
bitrary set of stimuli to be designated as targets, 
which the subject must discriminate, a task is cre- 
ated that focuses the subject’s attention in a manner 
that ensures the elicitation of the P300. However,
this task is unrelated to the subject of the investigation.
In contrast, when the oddball series is constructed
explicitly to call for a discrimination between
“crime-relevant” and “crime-irrelevant” items, the
likelihood of false positives increases, because innocent
subjects may process the explicitly relevant items
differently from the way they would the irrelevant items.

The designation of arbitrary targets makes it 
possible to hide the crime-relevant items among the 
nontarget, frequent, and irrelevant events. For the 
innocent subjects, these items are indistinguishable 
from all other irrelevants, because nothing in the 
procedure draws the subject’s attention to these 
items. For the “guilty” these items stand out among 
the frequent items, because they are associated with 
the crime, given the information the subject pos- 
sesses. It is important to note that the decision as 
to the subject’s possession of the guilty information 
is based on the appearance of differences among the 
frequent, nontarget items, differences that would 
not occur without the guilty knowledge. Thus, our 
procedure incorporates two different controls. The 
targets serve to identify the level of responsiveness 
to be expected from the subject to relevant items, 
guarding against false negatives. The irrelevant 
items serve as a control for the response to items 
that have no bearing on the subject of the inter- 
rogation. The indistinguishability of the probes and 
irrelevants guards against false positives. 

This approach to Interrogative Polygraphy will 
be illustrated by two studies. In the first experiment,
subjects participated in a mock espionage scenario,
and the P300 was used to detect guilty knowledge 
regarding the scenario. The subjects were also tested 
in a scenario of which they were “innocent.” That 
a subject participated in a scenario was inferred 
from the fact that he was knowledgeable, as re- 
vealed by the P300, regarding the significant details 
of the scenario. Similarly, “innocence” was demonstrated
by the subjects’ ignorance of relevant details, as revealed
by the absence of a P300 in response to the critical events.
The second experiment shows that the technique can be used to detect


A P300-Based Guilty Knowledge Test 




information about “real world” events. The tech- 
nique was used to detect guilty knowledge regarding 
minor crimes actually committed by each of 4 sub- 
jects, and lack of guilty knowledge regarding acts 
they did not commit. 


EXPERIMENT 1 


Method 
Subjects 


Twenty subjects (12 female), aged 19-27 years, par- 
ticipated in the study. All subjects were undergraduate 
or graduate students who were paid for their partici- 
pation. 


Procedure 


Subjects were trained by an interactive computer 
program to perform one of two different mock
espionage scenarios, which they then proceeded to carry
out. Each subject
learned one scenario and was unaware of the existence 
of the other scenario. Thus, each subject could be 
tested on the knowledge of the scenario that he or she 
experienced (this was that subject’s “Guilty” scenario) 
as well as on the scenario of which he or she was innocent. 
Each of the scenarios required the subjects to go to a 
specific location and meet a person with whom a pass- 
word would be exchanged. The subject then asked that 
person for a file that had a particular designation and 
pertained to a specific operation. With each of the two 
scenarios we associated six critical details, knowledge 
of which could be used as indication that the subject 
participated in the scenario. Appendix A presents the 
information associated with each of the scenarios, as 
well as all the other stimuli used in the study. 

The interactive training program consisted of a se- 
ries of instructions about the critical items that the 
subjects were instructed to memorize and actions they 
were to follow. The instructions were repeated several 
times, and subjects were repeatedly tested on the in- 
structions until they had responded correctly at least 
five times to questions regarding each of six key items. 
Following the training session, when the computer had 
established that the subject had learned the scenario 
to criterion, he or she was instructed to undertake the 
mission. In each case, the appropriate file folders were 
handed to the subject and he or she proceeded to the 
location where the information and files were ex- 
changed with the contact. 

One day after executing the scenario each subject 
underwent an ERP (event-related brain potential) aid- 
ed interrogation designed to test whether the subject 
possessed “guilty” knowledge. Subjects were tested for 
knowledge of each of the two scenarios, the one in 
which they had actually participated and the other 
scenario of which they knew nothing. Subjects were 
tested in three blocks for each of the two scenarios. 
Blocks of Scenario 1 alternated with blocks of Scenario
2. Note that Scenario 1, of which half the subjects were
guilty, was tested first for all subjects. Thus, the order
of testing was counterbalanced across subjects. Two
guilty knowledge tests were administered, one based 
on ERPs and one on skin conductance as measured 
by a conventional polygraph used in detection of
deception.²


The Testing Environment 


Stimuli were presented visually on a CRT under 
computer control, and the ERPs elicited by each stim- 
ulus were recorded and stored on tape for off-line 
analysis. Each stimulus consisted of two one-syllable 
words. Stimulus duration was 300 ms. The interstim- 
ulus interval was 1550 ms. 

The subjects were told that the stimuli would be 
two-word phrases. Some of these phrases were arbi- 
trarily designated as targets, and the subjects’ task was 
to press one of two microswitches whenever they saw 
a target and to press another microswitch when they 
saw an irrelevant item. Although the instructions re- 
ferred to two categories of stimuli, there were in fact 
three categories, because among the nontarget stimuli 
we included the “probes,” which were phrases refer- 
ring to the six critical items associated with each scen- 
ario. 

For each of the two scenarios, subjects were tested 
in three blocks of 144 trials per block. On each trial 
we presented the subject with a two-word phrase on 
the screen. The phrases could be targets, irrelevants, 
or probes. The three categories were presented in a 
random order. The set of targets contained 6 phrases 
to which the subject was instructed to respond by 


²The tests conducted with the conventional polygraph 
will not be discussed in this report. Using conventional 
polygraphic techniques, we were not able to detect the 
subjects’ guilty knowledge. However, the circumstances 
of testing were quite different from those used by profes- 
sional polygraphers, hence we do not believe any conclu- 
sions can be drawn from this phase of our study. We do 
note, however, that the conventional polygraph was not 
used concurrently with the ERP recording. Rather, a GKT 
patterned after Lykken (1981) was administered subse- 
quent to the ERP test. 




pressing one of the switches. Each of the 6 target items 
was repeated 4 times in each sequence, so that the total 
number of target trials was 24, or 17% of the trials. 
The remaining 120 trials consisted of irrelevants which 
could be derived from one of two stimulus sets. The 
“true irrelevants” included the items that bore no re- 
lationship to either of the scenarios. For each target 
there were 4 similar irrelevants, for a total of 24 unique 
irrelevants. The phrases used as targets and as irrele- 
vants are presented in Appendix A. Each of these items 
was repeated 4 times in the series for a total of 96 
items. The last set of stimuli constituted the probes. 
These were 6 items directly relevant to the scenario 
tested by the sequence. The six probes for each scen- 
ario are also listed in Appendix A. Each probe was 
repeated 4 times, so that there were 24 probe trials. 
Note that for the innocent the series consists of targets 
and irrelevants, with 17% of the former and 83% of 
the latter. For the guilty, 17% of the trials are targets, 
17% are probes, and 66% are irrelevants. The design 
of the test is summarized in Table 1. 

Note that the subject pressed a switch in response 
to every stimulus. One hand was used to respond to 
the targets and the other hand was used to respond to 
the probes and the irrelevant stimuli. Like probes, tar- 
gets were relatively rare, appearing once in every six 
stimuli. 

Prior to each block, a list of the target stimuli for 
that block appeared on the screen. The experimenter 
read the list aloud, then the subject read the list aloud, 
and then the subject was asked to recall the list and 
was corrected if any errors or omissions occurred. The 
subject was instructed to press one microswitch fol- 
lowing the presentation of a target stimulus, and an- 
other microswitch following any other stimulus. Sub- 
jects were instructed to press the switch as quickly and 
accurately as possible. The list of target stimuli was 
erased from the screen before the stimuli were pre- 
sented. 

Every 36 trials (that is, following one presentation 
of each stimulus) the stimuli were randomized again, 
and the next 36 trials were presented. This was re- 
peated four times each block, for a total of 144 trials. 
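
The composition of a block described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presentation software used in the study: the function name `build_block` and the placeholder phrase labels are hypothetical, and the actual phrases are those listed in Appendix A.

```python
import random
from collections import Counter


def build_block(targets, probes, irrelevants, repetitions=4, seed=None):
    """Return a list of (phrase, category) trials for one block.

    Every stimulus is shown once per sub-sequence, in a fresh random
    order; `repetitions` sub-sequences are concatenated into a block.
    """
    rng = random.Random(seed)
    stimuli = ([(p, "target") for p in targets]
               + [(p, "probe") for p in probes]
               + [(p, "irrelevant") for p in irrelevants])
    block = []
    for _ in range(repetitions):
        sub_sequence = stimuli[:]
        rng.shuffle(sub_sequence)      # re-randomize every 36 trials
        block.extend(sub_sequence)
    return block


# Placeholder labels standing in for the two-word phrases (hypothetical).
targets = [f"target {i}" for i in range(6)]
probes = [f"probe {i}" for i in range(6)]
irrelevants = [f"irrelevant {i}" for i in range(24)]

block = build_block(targets, probes, irrelevants, seed=0)
counts = Counter(category for _, category in block)
for category in ("target", "probe", "irrelevant"):
    print(category, counts[category], f"{counts[category] / len(block):.0%}")
```

With 6 targets, 6 probes, and 24 irrelevants, each sub-sequence has 36 trials and four of them give the 144-trial block, with targets and probes each making up one sixth of the trials.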





Table 1
Types of stimuli and predicted ERPs

Stimulus      Relative                                                Stimulus                    Predicted
Type          Frequency  Description         Instructions             Evaluation                  ERP
------------  ---------  ------------------  -----------------------  --------------------------  ---------
Target        1/6        Relevant to task    Right button press       Rare, Relevant              P300
                         (not to crime)
Irrelevant    2/3        Irrelevant to task  Left button press        Frequent, Irrelevant        No P300
(frequent)               and crime
Probe         1/6        Relevant to crime   Left button press        If innocent: Frequent,      No P300
                         (not to task)       (treat like irrelevant   Irrelevant
                                             stimuli)                 (indistinguishable from
                                                                      irrelevant stimuli)
                                                                      If guilty: Rare, Relevant   P300




We required the subjects to press buttons to ensure 
that they would actually pay attention to each stimulus 
and perform the stimulus classification that is a pre- 
requisite for the elicitation of the P300. However, the 
response times to the different types of stimuli may 
differ for guilty and innocent individuals, due to the 
increased task difficulty facing a guilty subject. An in- 
nocent subject can simply press one button for stimuli 
that are familiar, and the other button for all other 
stimuli. A guilty subject must distinguish among three 
types of stimuli, two of which are familiar. He must 
respond differently to stimuli that are familiar because 
of the instructions regarding responses to a subset of 
the stimuli (targets) and stimuli that are familiar be- 
cause of the scenario he has enacted (probes). In ad- 
dition to making the probes particularly salient and 
contributing to the probe P300 amplitude, this may 
lead to slower response times for probes than irrele- 
vants in a guilty subject. However, because reaction 
time can easily be voluntarily manipulated, it is not 
suitable as a measure of guilt or innocence. In particular,
the lack of a slower reaction time to probes may
easily be produced by a shift in strategy, and thus is
not an indication of innocence.


Data Acquisition 


The electroencephalogram (EEG) was recorded us- 
ing Ag/AgCl Beckman Biopotential electrodes placed 
at the Fz (frontal), Cz (central), and Pz (parietal) sites 
(10-20 International system), and the right mastoid. 
All sites were referred to the left mastoid. In off-line 
analysis, half of the right mastoid/left mastoid signal 
was subtracted from each channel, so that the reference 
was in effect the average of the mastoids. Electro-ocu- 
logram (EOG) was recorded from sub- and supraor- 
bital electrodes (above and below the right eye). The 
subjects were grounded at the forehead. Electrode 
impedance did not exceed 5 Kohm. Brain electrical 
activity was amplified by Van Gogh amplifiers with 
low- and high-pass filters set at half-amplitude fre- 
quencies of 35 and 0.02 Hz, respectively. These signals 
were digitized at a rate of 100 samples per second. 
ERPs and reaction times were recorded on tape for 
off-line analysis. 
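
The off-line re-referencing step can be illustrated with a short Python sketch. It assumes each recorded channel is a list of voltages already referred to the left mastoid, and that the right mastoid was recorded against the same reference; the function name and the numeric values are hypothetical.

```python
def rereference_to_average_mastoids(channel, right_mastoid):
    """Subtract half of the right-minus-left mastoid signal from a channel.

    If channel = V_site - V_left and right_mastoid = V_right - V_left,
    the result is V_site - (V_left + V_right) / 2.
    """
    return [c - 0.5 * m for c, m in zip(channel, right_mastoid)]


# Made-up "true" potentials, used only to check the algebra.
v_pz = [10.0, 12.0, 8.0]
v_left = [2.0, 4.0, 1.0]
v_right = [4.0, 2.0, 3.0]

# What the amplifiers would record with a left-mastoid reference.
recorded_pz = [a - b for a, b in zip(v_pz, v_left)]
recorded_right = [a - b for a, b in zip(v_right, v_left)]

rereferenced = rereference_to_average_mastoids(recorded_pz, recorded_right)
expected = [p - (l + r) / 2 for p, l, r in zip(v_pz, v_left, v_right)]
print(rereferenced, expected)
```

The two printed lists agree, which is the sense in which the reference becomes, in effect, the average of the two mastoids.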

Prior to data analysis, all data were digitally filtered 
using a 49-point, equal-ripple, zero-phase-shift, opti- 
mal finite impulse response low-pass filter with a pass- 
band cutoff frequency of 6 Hz and a stopband cutoff 
frequency of 8 Hz. (For a discussion of digital filtering
of ERPs, see Farwell, Martinerie, Bashore, & Rapp,
1991.)

All trials, including those with the EOG artifact, 
were recorded, and data from all trials were included 
in the reaction time results. However, only those trials 
with a range of EOG activity of less than 97.7 µV were 
included in the ERP analysis and in the trial counts 
that determined the number of trials presented. 
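
A minimal Python sketch of such an artifact criterion follows. It assumes that "range" means the maximum minus the minimum EOG value within an epoch and that the samples are expressed in microvolts; the function name and the sample epochs are hypothetical.

```python
EOG_RANGE_LIMIT_UV = 97.7  # rejection threshold, assumed to be in microvolts


def is_artifact_free(eog_epoch_uv, limit_uv=EOG_RANGE_LIMIT_UV):
    """Return True when the EOG range within the epoch is below the limit."""
    return (max(eog_epoch_uv) - min(eog_epoch_uv)) < limit_uv


# Two made-up EOG epochs: the first has a range of 90, the second of 100.
quiet_epoch = [0.0, 50.0, -40.0]
blink_epoch = [0.0, 60.0, -40.0]

# Only the first epoch would enter the ERP averages.
kept = [epoch for epoch in (quiet_epoch, blink_epoch) if is_artifact_free(epoch)]
print(len(kept))
```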


Results 


Event-Related Brain Potentials 


The average ERP responses for artifact-free trials 
of each trial type at the Pz electrode site for each
of the 20 subjects in the guilty condition are displayed
in Figure 1a. ERPs for the same subjects in
the innocent condition are displayed in Figure 1b. 
The responses were as predicted. As can be seen in 
the figure, a large P300 was elicited by the target 
stimuli, but not by the irrelevant stimuli. The 
probes elicited a P300 in most subjects when they 
were relevant to the subject's “crime.” A very small 
P300, if any, was elicited by the probes when the 
subject was “innocent.” 


Data Analysis 


Our task in this study is to assess the similarity, 
for each subject, between the probe ERP and the 
ERP elicited by the other two stimuli. Furthermore, 
it was necessary to employ a method of analysis 
that would give a statistical confidence for each in- 
dividual determination of guilt or innocence. How- 
ever, in order to increase the signal-to-noise ratio 
to a workable level, it was necessary to collapse all 
of the trials of each type for an individual case to 
one average—and thus to eliminate any information 
we had on the distribution of ERP responses within 
an individual case. Moreover, any parametric es- 
timate of the moments or distribution of correla- 
tions would not be valid, because the distribution 
of correlations violates the assumption of normal- 
ity. 

The statistical technique of bootstrapping (Ef- 
ron, 1979; Wasserman & Bockenholt, 1989) pro- 
vides one solution to this problem. To evaluate the 
significance of the apparent differences in Figure 1, 
we compared the three trial types using an iterative 
sampling bootstrapping procedure. Bootstrapping 
provides an estimate of the sampling distribution 
of a parameter when only a limited number of sam- 
ples are available by obtaining many random sub- 
samples from the available data and computing the 
parameter afresh for each of these subsamples. The 
distribution of these values approximates the actual 
distribution. 

We used bootstrapping to estimate the sampling 
distribution of two correlations: the correlation be- 
tween the average of the probe trials and the average 
of the irrelevant trials, and the correlation between 
the probe average and the target average. In our 
computations we used “double-centered” correla- 
tions (i.e., the grand mean for all trials of all types 
was subtracted from the probe, target, and irrele- 
vant average waveforms prior to the correlation 
computations). If the correlation between the probe 
and target trials is significantly greater than the cor- 
relation between the probe and irrelevant trials, 
then we can conclude that the probe ERP responses 
are more similar to the target ERP responses (in 
which a P300 is present) than to the irrelevant ERP 


Vol. 28, No. 5 


Farwell and Donchin 


536 


"Guilty" 


Target Frooé.-----— 


Irrelevant ap eesscaipana 


P300 at Pz 





0300 900 0300 900 0300 900 


0300 900 


msec 
Figure 1a. ERPs for each of 20 subjects in the “guilty” condition (Experiment 1). Note that 
the probe waveform is clearly distinguishable from the irrelevant waveform, and similar to the 


target waveform. 


537 


A P300-Based Guilty Knowledge Test 


September, 199] 


"Innocent" 


Irrelevant ~ 


Probe ------- 


Target 


P300 at Pz 





Subject 5 





0 300 900 


0300 900 
msec 


Figure 1b. ERPs for each of 20 subjects in the “innocent” condition (Experiment 1). (Subjects 
are the same as in Figure la.) Note that the probe waveform is similar to the irrelevant waveform 


0300 900 


0300 900 


ee ee ee ee 


538 


responses (in which there is no P300). If this is the 
case, then we can conclude that the subject recog- 
nizes the probes as a separate, rare category—that 
is, of crime-relevant events—and therefore that the 
subject is “guilty.”’ Similarly, if the correlation be- 
tween the probe and irrelevant trials is greater than 
the correlation between the probe and target trials, 
then we can conclude that the subject is “innocent.” 

The procedure was as follows. We averaged each 
four irrelevant trials, so we had an equal number 
of probe, target, and irrelevant trials, approximately 
72 of each (24 from each of three blocks). We cre- 
ated 100 random samples. The samples were taken 
with replacement. In each of 100 iterations we se- 
lected 72 of each of the three types of trials, yielding 
for each iteration three average ERPs. Each average 
was based on the 72 epochs selected for that iter- 
ation from one of the three trial types. We computed 
the probe-target and probe-irrelevant correlations 
for each iteration. Thus, the process yielded two 
groups of 100 correlations each. The distribution of 
these 100 correlations served as an estimate of the 
sampling distributions of the correlations. We then 
compared the distributions of the probe-irrelevant 
and probe-target correlations. 
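
The resampling procedure just described can be sketched in plain Python. This is an illustration under stated assumptions rather than the analysis code used in the study: all function names are hypothetical, the grand mean is taken to be the point-by-point mean of the three resampled averages, and the synthetic epochs (a Gaussian bump standing in for a P300, plus noise) are invented for the demonstration.

```python
import math
import random


def mean_waveform(epochs):
    """Point-by-point average of equal-length epochs."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def average_in_groups(epochs, size=4):
    """Average consecutive groups of `size` epochs (used for irrelevants)."""
    return [mean_waveform(epochs[i:i + size])
            for i in range(0, len(epochs) - size + 1, size)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def double_centered(probe, target, irrelevant):
    """Return (r_probe_target, r_probe_irrelevant) after removing the grand mean."""
    grand = mean_waveform([probe, target, irrelevant])
    p, t, i = ([v - g for v, g in zip(w, grand)]
               for w in (probe, target, irrelevant))
    return pearson(p, t), pearson(p, i)


def bootstrap_correlations(probes, targets, irrelevants, iterations=100, seed=None):
    """Resample each trial type with replacement and correlate the averages."""
    rng = random.Random(seed)
    r_pt, r_pi = [], []
    for _ in range(iterations):
        averages = [mean_waveform(rng.choices(epochs, k=len(epochs)))
                    for epochs in (probes, targets, irrelevants)]
        a, b = double_centered(*averages)
        r_pt.append(a)
        r_pi.append(b)
    return r_pt, r_pi


def simulate(n, amplitude, rng, n_samples=30):
    """Synthetic epochs: a Gaussian bump of the given amplitude plus unit noise."""
    return [[amplitude * math.exp(-((t - 15) / 4) ** 2) + rng.gauss(0, 1)
             for t in range(n_samples)] for _ in range(n)]


rng = random.Random(1)
targets = simulate(24, 10.0, rng)
irrelevants = average_in_groups(simulate(96, 0.0, rng))  # 96 trials -> 24 averages
guilty_probes = simulate(24, 8.0, rng)      # probes that carry a bump
innocent_probes = simulate(24, 0.0, rng)    # probes that look like irrelevants

for label, probes in (("guilty", guilty_probes), ("innocent", innocent_probes)):
    r_pt, r_pi = bootstrap_correlations(probes, targets, irrelevants, seed=2)
    wins = sum(a > b for a, b in zip(r_pt, r_pi))
    print(label, "probe-target exceeded probe-irrelevant on", wins, "of 100 iterations")
```

Subtracting the grand mean before correlating is what separates the two cases: without it, a probe average containing only noise would correlate near zero with both of the other averages.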

For each subject we counted the number of it- 
erations on which the probe/target correlation ex- 
ceeded the probe/irrelevant correlation. This value 
is called the “bootstrap index” in the following dis- 
cussion. In Figure 2 we show the distribution of the 
bootstrap index for the 40 tests we conducted. The
number of tests, labeled for guilt and innocence,
corresponding to each index is plotted in the figure.
It is evident that most guilty tests are associated
with the lower values of the bootstrap index, whereas
the higher values of the index are associated with
tests in which the subjects did not possess the
concealed knowledge. Five tests, two of the guilty and
three of the innocent, fall in the middle of the range.

Figure 2. The distribution of the bootstrap statistic for
all 40 tests conducted in Experiment 1. Dark bars indicate
the number of subjects who were “guilty” and were
assigned a given bootstrap value. Light bars show the same
data for the “innocent” subjects. (Plot title: Bootstrap
Index for “Guilty” and “Innocent” Conditions. Horizontal
axis: bootstrap index. Vertical axis: number of subjects.)

A decision regarding the guilt or innocence of a 
given subject depends on the extent to which the 
bootstrap index exceeds a criterion. Thus for ex- 
ample, we can decide to require that at least 90% 
of the iterations will declare the subject as guilty 
before guilt is accepted (bootstrap index of 0.10). 
A corresponding low limit on the index can be set, 
which if passed, the subject will be declared inno- 
cent. The data in Figure 2 suggest that we have 
considerable leeway in setting the criteria. This 
point is made also in Figure 3, which plots the out- 
comes of all possible decision rules for all the tests. 
It is clear that any “guilty” criterion that is greater 
than 0.06 and less than 0.36 will correctly identify 
18 of the 20 subjects as guilty and will not mis- 
classify any innocents as guilty. If we set the in- 
nocent criterion at any value greater than 0.47 and 
less than 0.80, 17 of the innocent subjects and none 
of the guilty subjects will be considered innocent. 
If we declare all subjects whose index falls between 
these two limits “indeterminate,” we will find that 
5 of the tests lead to indeterminate results whereas 
the remaining 35 tests lead to correct classification 
of the subjects. In no case, given the indeterminate 
class, do we have either a false positive or a false 
negative. 
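
The two-criterion decision rule can be written out as a short Python sketch. The function names are hypothetical, the default criteria (0.10 and 0.70) are illustrative values picked from within the ranges given above, and the index values in the loop are invented to exercise each branch. The counts passed to `validity` are the ones stated in the text: 35 correct determinations, none incorrect, and 5 indeterminate.

```python
def classify(index, guilty_criterion=0.10, innocent_criterion=0.70):
    """Map a bootstrap index (0 to 1) to a determination.

    Low values indicate guilt and high values indicate innocence;
    anything between the two criteria is left indeterminate.
    """
    if index < guilty_criterion:
        return "guilty"
    if index > innocent_criterion:
        return "innocent"
    return "indeterminate"


def validity(correct, incorrect, indeterminate):
    """Return (validity excluding, validity including) indeterminate tests."""
    decided = correct + incorrect
    excluding = correct / decided
    including = correct / (decided + indeterminate)
    return excluding, including


# Hypothetical index values, not data from the study.
for index in (0.00, 0.06, 0.40, 0.80, 1.00):
    print(f"{index:.2f}", classify(index))

# Counts stated in the text for the 40 tests of Experiment 1.
excluding, including = validity(correct=35, incorrect=0, indeterminate=5)
print(f"{excluding:.1%} excluding, {including:.1%} including indeterminates")
```

Moving either criterion within its stated range leaves these outcomes unchanged, which is the "leeway" the text refers to.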

The results of the bootstrapping analysis for one 
set of criteria are tabulated in Table 2. Table 2A 
summarizes the accuracy of determinations. Again, 
we have considerable leeway in setting the criteria 
while maintaining high accuracy of classification. 
Except for these 5 subjects, whose results are neither 
strongly “innocent” nor strongly “guilty,” all of the 
guilty subjects have scores of .06 or less and all of 
the innocent subjects have scores of .80 or more. 
Tables 2B and 2C list the determinations for each 
of the guilty and innocent cases respectively. These 
tables also tabulate the bootstrap index, the pro-
portion of iterations of the bootstrap procedure in
which the probe-irrelevant correlation was greater
than the probe-target correlation (i.e., the statistical
confidence for each determination). Note that a
high bootstrap index is an indication of “inno-
cence” and a low bootstrap index is an indication
of “guilt.”
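
As a rough illustration of how such an index could be computed, the following Python sketch resamples single trials with replacement, averages them, and counts the iterations in which the probe average correlates more strongly with the irrelevant average than with the target average. Several things here are assumptions made for the sketch and not details taken from this paper: the function names, the use of a plain Pearson correlation over the whole epoch, the resampling of each stimulus type independently, and the synthetic waveforms (an early deflection common to all stimuli plus a late positivity standing in for the P300).

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length waveforms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def resampled_average(trials, rng):
    """Average of trials drawn with replacement (same count as the original set)."""
    sample = [rng.choice(trials) for _ in trials]
    return [sum(column) / len(sample) for column in zip(*sample)]


def bootstrap_index(probe, target, irrelevant, iterations=100, seed=0):
    """Proportion of iterations in which probe-irrelevant r exceeds probe-target r."""
    rng = random.Random(seed)
    innocent_votes = 0
    for _ in range(iterations):
        p = resampled_average(probe, rng)
        t = resampled_average(target, rng)
        i = resampled_average(irrelevant, rng)
        if pearson(p, i) > pearson(p, t):
            innocent_votes += 1
    return innocent_votes / iterations


def synthetic_trials(with_p300, n_trials, rng, n_samples=60):
    """Hypothetical single trials: shared early deflection, optional late positivity, noise."""
    trials = []
    for _ in range(n_trials):
        trial = []
        for k in range(n_samples):
            value = 2.0 * math.exp(-((k - 10) / 4.0) ** 2)
            if with_p300:
                value += 5.0 * math.exp(-((k - 35) / 8.0) ** 2)
            trial.append(value + rng.gauss(0.0, 1.0))
        trials.append(trial)
    return trials


rng = random.Random(1)
target = synthetic_trials(True, 30, rng)
irrelevant = synthetic_trials(False, 30, rng)
probe_guilty = synthetic_trials(True, 30, rng)     # probe resembles target
probe_innocent = synthetic_trials(False, 30, rng)  # probe resembles irrelevant

guilty_index = bootstrap_index(probe_guilty, target, irrelevant)
innocent_index = bootstrap_index(probe_innocent, target, irrelevant)
print(guilty_index, innocent_index)
```

With these synthetic data the first index should come out near 0 and the second near 1, mirroring the direction of the index in Tables 2B and 2C: low values for "guilt," high values for "innocence."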


Scalp Distribution 


In addition to the data for the parietal (Pz) elec-
trode site illustrated in Figure 1, we recorded data


[Figure 3: “Classification Accuracy as a Function of Criterion.” Two panels, “Guilty” Criterion and “Innocent” Criterion. x-axis: Bootstrap Index; y-axis: Number of Subjects. Plotted traces: Hits and False Alarms; Correct Rejections and Misses.]

Figure 3. Accuracy of the guilty/innocent classification as a function of the bootstrap statistic 
used for determining guilt or innocence. A “hit” is a guilty subject classified as guilty; a “correct 
rejection” is an innocent subject classified as innocent; a “false alarm” is an innocent subject 
classified as guilty; and a “miss” is a guilty subject classified as innocent. 


Table 2
2A: ACCURACY OF DETERMINATIONS

                        Subject State
Decision          Guilty    Innocent    Total
Guilty              18          0         18
Innocent             0         17         17
Indeterminate        2          3          5
Total               20         20         40

Predictive Values
    Positive    Negative
      100%        100%

Validity (excluding inconclusives)    100%
Validity (including inconclusives)    87.5%





Table 2A: Accuracy of determinations. In the 87.5% of the cases 
where a determination was made, 100% of the determinations 
were accurate. Positive and negative predictive values reflect the 
probability that guilty and innocent subjects, respectively, will be 
correctly determined, when a determination is made (i.e., exclud-
ing indeterminates). Validity reflects the overall probability of cor-
rectly determining a subject's state. 


Decision rule: 


Bootstrap statistic < .10 ==> Guilty
Bootstrap statistic > .70 ==> Innocent
Bootstrap statistic >= .10 and <= .70 ==> Indeterminate


from the midline frontal (Fz) and central (Cz) sites.
Subjects exhibited the usual parieto-central scalp
distribution for the P300 in those conditions in
which a P300 was present (i.e., in response to target
stimuli in both the “innocent” and “guilty” con-
ditions and to probe stimuli in the “guilty” con-
dition). In most subjects, P300s showed maximum


amplitude at Pz, with a simultaneous smaller pos-
itive deflection at Cz, and a still smaller one at Fz.
Three out of 20 subjects exhibited a Cz-maximal
P300, with a slightly smaller positive deflection at
Pz than at Cz. Such a distribution across subjects
of P300 scalp distributions is typical (Fabiani et al., 
1987). 

We performed the bootstrapping procedure us- 
ing data from all three electrode sites in order to 
see whether or not the additional information pro- 
vided by scalp distribution could contribute to in- 
creased accuracy of determinations of “innocence” 
or “guilt.” We found that, due to greater variability 
in P300 amplitude and shape at Fz and Cz than at
Pz, including these additional channels in our boot-
strapping analysis did not improve our ability to
make accurate determinations. Collecting and vi-
sually inspecting the Fz and Cz data, however, did
serve a useful purpose: Finding a scalp distribution
typical of P300 made it clearer that the com-
ponent we quantified at Pz was indeed the P300. It
is possible that some of the discriminating power 
was contributed by components other than the 
P300. From a practical viewpoint this is not an 
issue. As long as the system yields a valid decision 
regarding the subject's standing, the percent of the 
discriminating power contributed by different com- 
ponents is not a major issue at this stage of the 
development of the technique. 


Overt Response Measures 


Mean button-press response times for all trials 
(including EOG-contaminated trials) for each sub- 




Table 2B & C
2B: GUILTY CONDITION

Subject #    Determination    Statistical Confidence
 1           Guilty           .00
 2           Guilty           .01
 3           Guilty           .02
 4           Guilty           .00
 5           Guilty           .00
 6           Guilty           .06
 7           Guilty           .00
 8           Guilty           .00
 9           Guilty           .00
10           Guilty           .00
11           Guilty           .00
12           Guilty           .03
13           Guilty           .01
14           Guilty           .00
15           Guilty           .02
16           Indeterminate    .45
17           Guilty           .03
18           Guilty           .00
19           Guilty           .00
20           Indeterminate    .43


2C: INNOCENT CONDITION

Subject #    Determination    Statistical Confidence
 1           Innocent         .80
 2           Innocent         .95
 3           Indeterminate    .47
 4           Innocent         1.00
 5           Innocent         1.00
 6           Innocent         .98
 7           Innocent         .93
 8           Indeterminate    .36
 9           Innocent         1.00
10           Innocent         .99
11           Innocent         .96
12           Innocent         2
13           Innocent         1.00
14           Innocent         .83
15           Innocent         1.00
16           Innocent         .96
17           Innocent         1.00
18           Indeterminate    .46
19           Innocent         1.00
20           Innocent         1.00


Tables 2B & 2C: Determinations and statistical confi- 
dence. Bootstrap statistic is the proportion of iterations 
(out of 100) where the correlation between the probe and 
irrelevant waveforms at Pz was greater than the correla-
tion between the probe and target waveforms. Note that
a higher value indicates “innocence” and a lower value 
indicates “guilt.” 




ject in each condition are presented in Table 3. The 
response times are as predicted. Probe response 
times tend to be slower than irrelevant response 
times in the guilty condition, but not in the inno- 
cent condition. Also, when a given subject is guilty, 
the button presses in response to target stimuli also 
tend to be slower and less accurate than when the 
same subject is innocent. However, as mentioned 
above, because reaction time may be easily manip- 
ulated, it is not suitable as a measure of the presence 
of knowledge. 
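
The probe-versus-irrelevant pattern can be checked directly from the per-subject means in Table 3. The short Python sketch below uses the probe and irrelevant columns as listed there; the helper names `mean_difference` and `slower_count` are introduced here purely for illustration, and the comparison is descriptive only (no significance test is implied).

```python
# Mean reaction times (ms) from Table 3, subjects 1-20 in order.
innocent_probe = [775, 927, 792, 750, 746, 759, 820, 829, 749, 805,
                  736, 839, 756, 747, 742, 813, 744, 816, 825, 764]
innocent_irrelevant = [775, 933, 775, 765, 751, 744, 853, 776, 768, 806,
                       744, 838, 780, 744, 745, 827, 743, 813, 832, 776]
guilty_probe = [880, 1087, 1031, 877, 784, 808, 911, 847, 872, 1111,
                774, 890, 991, 767, 869, 869, 792, 833, 933, 785]
guilty_irrelevant = [810, 936, 880, 762, 730, 762, 835, 773, 781, 905,
                     759, 826, 842, 739, 779, 831, 769, 792, 850, 774]


def mean_difference(probe, irrelevant):
    """Average of the per-subject (probe - irrelevant) differences, in ms."""
    diffs = [p - i for p, i in zip(probe, irrelevant)]
    return sum(diffs) / len(diffs)


def slower_count(probe, irrelevant):
    """Number of subjects whose probe responses were slower than irrelevant responses."""
    return sum(1 for p, i in zip(probe, irrelevant) if p > i)


guilty_diff = mean_difference(guilty_probe, guilty_irrelevant)        # about +78.8 ms
innocent_diff = mean_difference(innocent_probe, innocent_irrelevant)  # about -2.7 ms
guilty_slower = slower_count(guilty_probe, guilty_irrelevant)         # 20 of 20
innocent_slower = slower_count(innocent_probe, innocent_irrelevant)   # 7 of 20
print(guilty_diff, innocent_diff, guilty_slower, innocent_slower)
```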


Table 3 
Mean reaction times to the probe, irrelevant, and target 
stimuli for each subject in each condition 


Reaction Times (ms) 





Subject # Target Probe Irrelevant 
Innocent Condition 
1 918 775 775 
2 1001 927 933 
3 919 792 775 
4 900 750 765 
5 839 746 751 
6 838 759 744 
7 948 820 853 
8 982 829 776 
9 887 749 768 
10 982 805 806 
11 904 736 744 
12 951 839 838 
13 892 756 780 
14 855 747 744 
15 928 742 745 
16 915 813 827 
17 874 744 743 
18 948 816 813 
19 887 825 832 
20 861 764 776 
Averages 911 786 789 
Guilty Condition 

1 941 880 810 
2 1061 1087 936 
3 999 1031 880 
4 937 877 762 
5 863 784 730 
6 906 808 762 
7 965 911 835 
8 906 847 773 
9 1017 872 781 
10 1146 1111 905 
11 950 774 759 
12 972 890 826 
13 1033 991 842 
14 871 767 739 
15 977 869 779 
16 944 869 831 
17 920 792 769 
18 965 833 792 
19 927 933 850 
20 857 785 774 


Averages 957 885 806 




Discussion 


The results confirm our prediction that the P300 
can be used to identify those subjects who were 
familiar with the tested scenario. Inspection of the 
averages obtained from each subject indicates that 
the predicted pattern was obtained from virtually 
all subjects. Yet, it is important to avoid reliance 
on the gross waveforms when decisions are made 
with regard to individuals, decisions that may have 
serious consequences for the individual. We believe 
that such decisions should take into account the 
inherent variability of the data. The decision rules 
we base on the bootstrapping procedures adopt a 
conservative approach. It is gratifying to note that 
in no case did the bootstrapping analysis lead to an 
erroneous decision. That is, we were led neither to 
false positives nor to false negatives by the analysis. 
Instead, the analysis recognized that it did not have 
adequate information in 12.5% of the cases. 
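
The percentages quoted here follow directly from the counts in Table 2A. As a small worked check, the Python lines below recompute them; the variable names are illustrative, and the predictive values are computed in the usual way (correct decisions of a given kind divided by all decisions of that kind), which is an assumption about how the table's figures were derived.

```python
# Counts from Table 2A: decision -> (guilty subjects, innocent subjects)
table_2a = {
    "Guilty": (18, 0),
    "Innocent": (0, 17),
    "Indeterminate": (2, 3),
}

total = sum(g + i for g, i in table_2a.values())           # 40 tests
indeterminate = sum(table_2a["Indeterminate"])             # 5 tests
determined = total - indeterminate                         # 35 tests
correct = table_2a["Guilty"][0] + table_2a["Innocent"][1]  # 35 tests

indeterminate_rate = indeterminate / total          # 0.125
validity_excluding = correct / determined           # 1.0
validity_including = correct / total                # 0.875
positive_predictive = table_2a["Guilty"][0] / sum(table_2a["Guilty"])      # 1.0
negative_predictive = table_2a["Innocent"][1] / sum(table_2a["Innocent"])  # 1.0
print(indeterminate_rate, validity_excluding, validity_including)
```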

The results of this successful test may be viewed 
with some skepticism by those who believe that 
laboratory experiments using mock crimes do not 
provide an adequate test of an interrogative pro- 
cedure. We agree that implementation of the ideas 
embodied in our procedure requires extensive test-
ing in “realistic” settings. We can, however, offer
here a test conducted in an admittedly nonstressful
setting, which did, nevertheless, interrogate subjects
about “real life” transgressions of which they were
definitely guilty. This data set is also of interest 
because it examines the efficacy of the P300-based 
Guilty Knowledge Test in circumstances in which 
the concealed knowledge derives from incidents 
that occurred at intervals ranging from weeks to 
months before the P300 test was conducted. Al- 
though it is true that the subjects’ memory of the 
incidents was refreshed by the discussion of the in- 
cidents when they were recruited, the results do 
extend the scope of this feasibility test. 


EXPERIMENT 2 


The results of Experiment 1 clearly show the 
effectiveness of this paradigm in detecting guilty 
knowledge regarding a mock crime. The purpose of 
Experiment 2° was to examine the feasibility of the 
system in detecting guilty knowledge regarding ac-
tual crimes, which were not committed as a part of





‘Experiment 2 was, in fact, our initial attempt to val- 
idate the concept embedded in the P300-based Interro- 
gative Polygraphy method described in this paper. It was 
the success of our procedure in the context of Experiment 
2 that led to our undertaking the large scale validation 
project described here as Experiment 1.




a laboratory study and may have taken place a con- 
siderable time prior to the testing situation. We 
tested 4 undergraduates who admitted having par- 
ticipated in minor crimes or socially undesirable 
activities (e.g., being arrested for underage drink- 
ing). 

Method 


Subjects 


Four undergraduates at the University of Illinois 
served as subjects. The students were recruited by our 
advertising (through word of mouth) for subjects who 
had committed minor crimes or transgressions. The 
subjects were responsible for four transgressions, each 
being guilty of one and innocent of the other three. 


Procedure 


The experimental design was essentially the same 
as for Experiment 1, except for the modifications de- 
scribed below. The stimuli were presented visually, 
each stimulus a two-word phrase ranging from two to 
six syllables total. The probe stimuli were items rele- 
vant to the crime in question (e.g., the place where the 
crime took place or the name of another person in- 
volved). For each of the six probe stimuli there were 
one target and four irrelevant stimuli, as in Experiment 
1. The target and irrelevant stimuli corresponding to
each of the probe stimuli were items of the same type 
(e.g., a location where the crime did not take place, a 
fictitious name). Thus, the probe and irrelevant items 
were indistinguishable except to the guilty person. 
There was no training session or mock crime, because 
the test focused on an actual crime that had already 
taken place. Instead of pressing a button in response 
to target items as in Experiment 1, subjects were in- 
structed to count the target items and to ignore the 
probe and irrelevant items. Subjects were asked for 
their tally at the end of each block of trials.* The target 
items were displayed at the bottom of the CRT 
throughout each block of the testing session as a mem- 
ory aid. 
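
The composition of the stimulus set implied by this description (six probes, each with one target and four irrelevants) can be sketched as follows. The item labels are hypothetical placeholders, not the phrases used in the study, and the simple random shuffle is an assumption; the text here does not specify how trials were ordered within a block.

```python
import random
from collections import Counter


def build_stimulus_set(n_probes=6, targets_per_probe=1, irrelevants_per_probe=4):
    """Return (category, label) pairs; labels are placeholders for two-word phrases."""
    items = []
    for k in range(1, n_probes + 1):
        items.append(("probe", f"probe_{k}"))
        for j in range(1, targets_per_probe + 1):
            items.append(("target", f"target_{k}_{j}"))
        for j in range(1, irrelevants_per_probe + 1):
            items.append(("irrelevant", f"irrelevant_{k}_{j}"))
    return items


def block_order(items, seed=0):
    """One shuffled pass through the stimulus set (ordering scheme assumed)."""
    order = list(items)
    random.Random(seed).shuffle(order)
    return order


items = build_stimulus_set()
counts = Counter(category for category, _ in items)
proportions = {category: n / len(items) for category, n in counts.items()}
print(counts)       # 6 probes, 6 targets, 24 irrelevants
print(proportions)  # probes and targets each 1/6 of the set
```

With these counts, probes and targets each make up one sixth (about 17%) of the items, in line with the proportions reported for Experiment 1.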

Each subject was tested on his or her own crime 
(“guilty” condition), and also on another crime about
which he or she knew nothing (“innocent” condition).
The stimuli for the innocent condition for each subject 
consisted of the stimuli relevant to another subject's 
crime. 


Results 


The waveforms for each of the subjects in the 
“innocent” and “guilty” conditions are presented 
in Figure 4. Results are as predicted. It can be seen 
from the figure that there is a large P300 in response 


“Note that this counting task may not be as effective 
as the button press task in ensuring that subjects actually 
attend to and classify each stimulus. The button press was 
an innovation that was introduced after these data had 
been collected (see Farwell & Donchin, 1986, 1988, 1989).


[Figure 4: ERP waveforms for Subjects 1 through 4; traces for Target, Probe, and Irrelevant stimuli; x-axis: Time (msec).]

Figure 4. ERPs for 4 subjects in the “innocent” and “guilty” conditions (Experiment 2).


to the targets for all subjects in both conditions, 
and a very small P300, if any, in response to the 
irrelevants. In the “guilty” condition, all subjects 
show a P300 to the probe stimuli similar to their 
P300 to the targets. In the “innocent” condition the 
probe responses are similar to the irrelevants, and 
do not contain a large P300. 

As in Experiment 1, we employed bootstrapping
to quantify the differences that can be seen in Figure 
4. The results are displayed in Table 4. 

As in Experiment 1, the system proved highly
reliable in distinguishing between the presence and 
the absence of guilty knowledge. The accuracy of 
determinations was the same in Experiment 2 as in 
Experiment 1: 100% correct in the cases in which 
a determination was made, with 12.5% indetermi- 
nate. All of the determinations, both “innocent” 
and “guilty,” were made with a very high statistical 
confidence. 


GENERAL DISCUSSION 


The studies reported here were designed as a test 
of the feasibility of using event-related brain po- 
tentials (ERPs) in Interrogative Polygraphy (com- 
monly referred to as “lie detection’). The data con- 
firm the feasibility of designing a guilty knowledge 
test using the amplitude of the P300 component of 
the ERP as the measure used to detect the guilty 


knowledge. It is important to examine the logic of 
guilty knowledge tests so that the implications of 
our results for Interrogative Polygraphy can be 
placed in perspective. The conventional ANS-based 
Guilty Knowledge Test (GKT) was developed by 
Lykken (1959, 1960) as a procedure that was de- 
signed to circumvent the ambiguities associated 
with the control question technique. Lykken’s pro- 
cedure presents the subject with explicit questions 
that are directed to the crime being investigated. 
The test is a multiple choice test and the subject is 
provided with several equally plausible answers one 
of which is the crime-relevant test item. 

The key assumption of the GKT is that there is
some information about the episode that is known 
only to the investigator and to people who did par- 
ticipate in the episode. This information is the 
“guilty knowledge.” Any implementation of the 
GKT sets up conditions in which the subject is pre-
sented with a sequence of stimuli, among which are
stimuli that are distinct from all other stimuli by 
virtue of the guilty knowledge. The tests work if the 
distinctiveness of the items reflecting guilty knowl- 
edge is associated with a differential response in the 
bodily system whose activity is being monitored by 
the polygrapher. 

As Furedy (1986) reminds us, the GKT does not 
detect deception per se. That is, the technique does 




Table 4
4A: ACCURACY OF DETERMINATIONS

                        Subject State
Decision          Guilty    Innocent    Total
Guilty               4          0          4
Innocent             0          3          3
Indeterminate        0          1          1
Total                4          4          8

Predictive Values
    Positive    Negative
      100%        100%

Validity (excluding inconclusives)    100%
Validity (including inconclusives)    87.5%


Table 4A: Accuracy of determinations. In the 87.5% of the cases 
where a determination was made, 100% of the determinations 
were accurate. Positive and negative predictive values reflect the 
probability that guilty and innocent subjects, respectively, will be 
correctly determined, when a determination is made (i.e., exclud-
ing indeterminates). Validity reflects the overall probability of cor-
rectly determining a subject's state. 





Decision rule: 


Bootstrap statistic < .10 ==> Guilty
Bootstrap statistic > .70 ==> Innocent
Bootstrap statistic >= .10 and <= .70 ==> Indeterminate





4B: GUILTY CONDITION

Subject #    Determination    Statistical Confidence
1            Guilty           .07
2            Guilty           .03
3            Guilty           .01
4            Guilty           .02


4C: INNOCENT CONDITION

Subject #    Determination    Statistical Confidence
1            Innocent         .99
2            Indeterminate    .27
3            Innocent         .96
4            Innocent         1.00





Tables 4B & 4C: Determinations and statistical confidence. Boot-
strap statistic is the proportion of iterations (out of 100) where
the correlation between the probe and irrelevant waveforms at Pz
was greater than the correlation between the probe and target
waveforms. Note that a higher value indicates “innocence” and 
a lower value indicates “guilt.” 


not directly assess the truth value of the subject’s 
assertion. In that, it is similar to all other methods 
of Interrogative Polygraphy. Instead, as Lykken 
(1974) points out, the “basic assumption of the 
guilty knowledge test is that the guilty subject will 
show stronger autonomic response to what he rec-
ognizes as the significant alternative than he would
have shown without such guilty knowledge” (pp. 
727-728). Lykken goes on to explicitly attribute the 
distinctive response to an “orienting reflex” elicited 
by the “correct” alternative. (For a detailed dis- 




cussion of the relationship between P300 and the 
Orienting Reflex, see Donchin & Fabiani, in press; 
Donchin & Heffley, 1978; and Siddle & Packer, in 
press. A comprehensive review of ANS approaches 
to Interrogative Polygraphy is provided by Furedy, 
1986.) 
The thesis argued in this report is that it is pos- 
sible to employ the logic of the GKT without relying 
on the activation of autonomic responses. The sys- 
tem presented here does not rely on the elicitation 
of such responses by these “correct alternatives,” 
to use Lykken’s terminology, but rather takes ad- 
vantage of the fact that these alternatives would be 
the only stimuli used in the test whose presentation 
will activate a cognitive processing subsystem re- 
vealed by the appearance of the P300. 

The P300, recorded within the oddball para- 
digm, is an obvious candidate for implementing a 
GKT, because in this paradigm the subject is called 
upon to discriminate between two categories into 
which the individual members of the series of stim- 
uli can be classified. When one of the categories is 
rare its members will elicit the P300. The actual 
application of this paradigm in Interrogative Po- 
lygraphy presents a number of challenges (see, for 
example, Rosenfeld, Nasman, Whalen, Cantwell, & 
Mazzeri, 1987, and Rosenfeld, Angell, Johnson, & 
Qian, 1991). 

The procedure reported by Farwell and Donchin 
(1986), which is examined in detail in the studies 
reported here, modified the oddball paradigm in a 
manner that provides the necessary control con- 
ditions and allows us to focus the subjects’ attention 
on the stimuli without at the same time biasing 
them to be concerned with the crime relevance of 
the test items. We do so by assigning the subjects 
an arbitrary task which, although having nothing 
to do with the crime and the investigation, requires 
them to carefully monitor a series of events for the 
occurrence of targets. These targets have nothing to 
do with the issue at hand, yet they become relevant 
by virtue of the instructions the subjects receive. 
Because these targets occur rarely, they do elicit a 
markedly large P300 and thus provide a baseline 
measure of P300 amplitude that can be elicited 
from the particular individual, on the particular oc- 
casion. The amplitude and shape of these P300s 
serve as a yardstick against which the P300s elicited 
by the “correct alternatives,” as Lykken called the
“guilty knowledge” items, can be evaluated. 

These correct alternatives are hidden among the 
irrelevant items, and for the innocent they are in- 
discriminable from these items. The subjects who 
possess the guilty knowledge distinguish these 
“probes” from the other irrelevant items. Hence, 
the elicitation of a P300 by these probes is prima 




facie evidence that the subject possesses the guilty 
knowledge. This procedure combines the virtues of 
two of the major forms of conventional Interro- 
gative Polygraphy procedures, the Guilty Knowl- 
edge Test (GKT) and the Control Question Tech- 
nique (CQT). As in conventional GKTs, we utilize 
crime-relevant stimuli which to the innocent are 
indistinguishable from the irrelevant stimuli. This 
provides for resistance to false positives. However, 
the conventional GKT provides no control items 
to which a response is virtually guaranteed to serve 
as a criterion for evaluating the response to relevant 
items. These control questions are needed to guard 
against an excessive number of false negatives (Of- 
fice of Technology Assessment, 1983). This defi- 
ciency of conventional GKT is avoided here by in-
troducing the target stimuli, which serve as control
items. 

These data have both specific and broad impli- 
cations for Interrogative Polygraphy. It is evident 
that it is possible to devise procedures that utilize 
event-related brain potentials in the aid of inter- 
rogations. It remains to be seen whether this par- 
ticular implementation of the GKT can be applied 
within field settings. There is a long-standing debate 
(see Furedy, 1986) in this area of study between 
those who believe that the effectiveness of Inter- 
rogative Polygraphy cannot be assessed in labora- 
tory conditions and those who assume that the lab- 
oratory provides an effective test bed for such tech- 
niques. Those who doubt the value of laboratory 
tests tend to invoke the lack of genuine stress in 
the laboratory as the rationale for their skepticism. 
It seems, though, that this issue is one to be resolved 
by empirical investigation. The utility of P300 in 
real life settings can be determined only by a test 
conducted in such settings. Yet, we note that be- 
cause the P300 is used here as an index of a cog- 
nitive rather than an affective activity, it is consid- 
erably less reasonable to discount laboratory dem- 
onstrations. Thus far there has been little evidence 
that P300 can be modulated by affective variables, 
except that the more relevant the stimuli are to the 
subject’s task, and the more relevant the task is to 
the subject, the larger the P300 (see Johnson, 1986). 
It would seem, therefore, that as the overall signif- 
icance of the test increases in real life interrogations, 
the technique’s effectiveness will increase rather 
than decrease. It is necessary to emphasize that the 
demonstration reported in this paper does not con- 
stitute evidence that a P300-based GKT will work 
under all circumstances with the effectiveness 
achieved in our test. The conditions of the test were 
clearly very different from those found in actual 
investigations. Neither the motivation nor the level 




of involvement of our subjects was close to that 
experienced by real suspects. Much additional re- 
search is required if the concept we have outlined 
is to be implemented.° 

The P300 is but one of several endogenous com- 
ponents of the event-related brain potential (ERP) 
that may be used in Interrogative Polygraphy. The 
vocabulary of ERP components is quite varied, and 
one assumes that the current list of well studied 
components is not exhaustive. Thus, there are a 
number of negative components (Hillyard, 1984;
Näätänen & Picton, 1987) appearing within the first
150 ms after a stimulus, which are sensitive to 
changes in the directionality of the subjects’ atten- 
tion and to the occurrence of various mismatches 
between the expected and the obtained stimuli. Ku- 
tas and her coworkers (Kutas & Hillyard, 1980; Ku- 
tas & Van Petten, 1988) have described a compo- 
nent labeled N400, which is affected by the degree 
to which a word violates the linguistic constraints 
imposed by the context in which the word was pre- 
sented. There are a number of event-preceding neg- 
ativities, such as the Readiness Potential and the 
Contingent Negative Variation, which reflect both 
cognitive and motor preparatory processes (Rohr- 
baugh & Gaillard, 1983; Walter, Cooper, Aldridge, 
McCallum, & Winter, 1964). 

Each of these ERP components can be mar- 
shalled in the service of appropriately designed In- 
terrogative Polygraphy procedures. It is critical to 
understand, however, that when designing such proce-
dures it would be an unwise strategy to launch a
search for what has often been called The Pinocchio 
Response. That is, it is very unlikely that any ERP 
component, or any feature of the EEG, will serve 
as a specific and unique indicator that the subject 
has lied, or was in any other way deceptive. 

It seems prudent to assume that the psycho- 
physiological measures will not, by themselves, pro- 
vide the hoped-for specific and unique indicators 
of deception per se. A more likely strategy for the 
design of an Interrogative Polygraphy method be- 
gins with a comprehensive understanding of the 
psychophysiological foundations of the measures 


‘A reviewer of an early version of this paper has ex- 
pressed a common perception among those who practice 
conventional polygraphy that the use of ERPs requires 
the use of “costly, cumbersome and complex equipment.” 
This is simply not the case. A fully functional P300-based 
testing device can be implemented in a device not larger 
than a standard polygraph, and the attachments to the 
subjects need be no more complex than those used for 
measuring the skin conductance response, and certainly 
less annoying than the conventional blood pressure cuff. 




one is planning to use. One must begin with an
analysis of what is known of the antecedent con- 
ditions for a component (Donchin & Coles, 1988) 
and the functional significance of the component. 
These will provide clues regarding the processing 
subsystems that might be manifested by the com- 
ponent. This psychophysiological database needs to 
be interfaced with the interrogatory task. We con- 
ceive of the psychophysiological database as a core, 
around which the designer constructs a shell. The 
shell capitalizes on the nature of the component and 
structures a scenario in which stimuli are presented 
to the subject in such a manner that the ERPs they 
elicit can be interpreted unequivocally in the con- 
text of the interrogation. 

The three-category oddball paradigm, which is 
the subject of this study, is an example of such a 
shell. It is one of many possible arrangements of
stimuli that would elicit a P300. In other words,
one can imagine numerous shells that attempt to
place our knowledge of the P300 in the service of 
some application. However, not all shells are equal- 
ly effective. Hiding the probes among the irrele- 
vants yields a very effective shell. Using the crime- 
relevant items as the only rare category in an odd- 
ball paradigm is a very poor shell. Similar consid- 
erations apply to each of the ERP components 
enumerated above. Knowing their functional sig- 
nificance and antecedent conditions is necessary, 
but not sufficient, for ensuring effective designs for 
shells. The emergence of an ERP-based polygraphy 
will depend on the ingenuity with which shells are 
designed. Useful shells will emerge from a careful 
and methodical analysis of the properties of the 
different ERP components, not from a brute force 
search for deception indicators. 

We conclude by noting that the results we report 
raise interesting questions with respect to the P300. 
Our success in detecting those subjects who were 
informed about the probes is somewhat puzzling. 
The probes were definitely not relevant to the task 
the subject was ostensibly performing. They were 
distinguished from the other irrelevant items solely 
by their association with the mock espionage scen-
ario in which the subject participated. Why did such
“irrelevant” items elicit a P300? We hypothesized, 
and demonstrated, that subjects are sensitive to 
such seemingly irrelevant items if these items are 
distinctive in some dimension that is important to 
the subject, even though it is not relevant to the 
task the subject is performing. 

The implication is that subjects monitor events 
along dimensions other than those specified by the 
experimenter, and that when distinctiveness is de- 
tected across such irrelevant dimensions, it may 
trigger the processing subsystem manifested by the 
P300. It did appear plausible that crime-related 
items would play such a role, and our results in- 
dicate that they did. However, we cannot predict, 
in the general case, which dimensions along which 
items are distinct will, or will not, be noted by the 
subject when the dimension in question is formally 
irrelevant. This is a rather important question, be- 
cause the concept of task relevance has played an 
important role in accounting for the P300 (Donchin 
et al., 1978; Johnson, 1986; Johnson & Donchin, 
1978; Rösler, 1983). Frequently, task relevance is
defined strictly in terms of the task assigned to the 
subject (e.g., Courchesne, Hillyard, & Galambos, 
1975). This is not, however, a fully adequate defi-
nition, because subjects evidently extract infor-
mation relevant to aspects of the situation that have
little if anything to do with the assigned task. The
present study did not examine in detail the range 
of distinctions that would play the role played by 
the probes in the present study. Such an analysis is 
clearly needed. 

A detailed elucidation of the cir-
cumstances under which hidden probes will be ef-
fective in eliciting a P300 is important from the
psychophysiological perspective, because of the 
contribution it will make to a better understanding 
of the P300 component. The procedure we used 
here for detecting whether subjects have informa- 
tion regarding an espionage mission (or a minor 
crime) can be extended to determine whether sub- 
jects make other distinctions of which they are un- 
aware or for which they are not reliable witnesses. 


REFERENCES 


Barland, G.H., & Raskin, D.C. (1975). An evaluation of 
field techniques in detection of deception. Psycho-
physiology, 12, 321-330.

Ben-Shakhar, G., & Furedy, J.J. (1990). Theories and ap- 
plications in the detection of deception: A psychophys- 
iological and international perspective. New York: 
Springer-Verlag. 

Courchesne, E., Hillyard, S.A., & Galambos, R. (1975). 
Stimulus novelty, task relevance and the visual evoked 
potential in man. Electroencephalography & Clinical 
Neurophysiology, 39, 131-143. 


Donchin, E. (1981). Surprise! ... Surprise? Psychophysi- 
ology, 18, 493-513. 

Donchin, E., & Coles, M.G.H. (1988). Is the P300 com- 
ponent a manifestation of context updating? The Be- 
havioral and Brain Sciences, 11, 355-425. 

Donchin, E., & Fabiani, M. (in press). The use of event- 
related brain potentials in the study of memory: Is 
P300 a measure of event distinctiveness? In J.R. Jen- 
nings & M.G.H. Coles (Eds.), Handbook of cognitive 
psychophysiology: Central and autonomic nervous sys- 
tem approaches. Chichester, UK: John Wiley. 




Donchin, E., & Heffley, E. (1978). Multivariate analysis 
of event-related potential data: A tutorial review. In 
D. Otto (Ed.), Multidisciplinary perspectives in event- 
related brain potential research (pp. 215-217). EPA- 
6001-9-77-043, Washington, DC: U.S. Government 
Printing Office. 

Donchin, E., Heffley, E., Hillyard, S.A., Loveless, N., 
Maltzman, I., Öhman, A., Rösler, F., Ruchkin, D., &
Siddle, D. (1984). Cognition and event-related poten- 
tials. II. The orienting reflex and P300. Brain and in- 
formation: Event-related potentials. Annals of the New 
York Academy of Sciences, 425, 39-57. 

Donchin, E., Karis, D., Bashore, T.R., Coles, M.G.H., & 
Gratton, G. (1986). Cognitive psychophysiology and 
human information processing. In M.G.H. Coles, E. 
Donchin, & S.W. Porges (Eds.), Psychophysiology: Sys- 
tems, processes, and applications (pp. 244-267). New 
York: Guilford Press. 

Donchin, E., Ritter, W., & McCallum, W.C. (1978). Cog- 
nitive psychophysiology: The endogenous components 
of the ERP. In E. Callaway, P. Tueting, & S. Koslow 
(Eds.), Brain event-related potentials in man (pp. 349- 
441). New York: Academic Press. 

Duncan-Johnson, C.C., & Donchin, E. (1977). On quan- 
tifying surprise: The variation of event-related poten- 
tials with subjective probability. Psychophysiology, 14, 
456-467. 

Efron, B. (1979). Bootstrap methods: Another look at the
jackknife. Annals of Statistics, 7, 1-26. 

Ekman, P. (1985). Telling lies: Clues to deceit in the mar- 
ketplace, politics, and marriage. New York: Norton. 

Fabiani, M., Gratton, G., Karis, D., & Donchin, E. (1987). 
The definition, identification, and reliability of mea- 
surement of the P300 component of the event-related 
brain potential. In P.K. Ackles, J.R. Jennings, & 
M.G.H. Coles (Eds.), Advances in psychophysiology 
(Vol. 2, pp. 1-78). Greenwich, CT: JAI Press, Inc. 

Farwell, L.A., & Donchin, E. (1986). The “brain detector:” 
P300 in the detection of deception [Abstract]. Psycho- 
physiology, 24, 434. 

Farwell, L.A., & Donchin, E. (1988). The truth will out: 
Interrogative polygraphy with event-related brain po- 
tentials [Abstract]. Psychophysiology, 25, 445. 

Farwell, L.A., & Donchin, E. (1989). Detection of guilty 
knowledge with event related potentials [Abstract]. 
Psychophysiology, 26(Suppl.), S8.

Farwell, L.A., Martinerie, J.M., Bashore, T.B., & Rapp, 
P.E. (1991). Optimal digital filters for long latency
event-related brain potentials. Manuscript in prepara- 
tion. 

Furedy, J.J. (1986). Lie detection as psychophysiological 
differentiation: Some fine lines. In M.G.H. Coles, E. 
Donchin, & S.W. Porges (Eds.), Psychophysiology: Sys- 
tems, processes, and applications (pp. 683-701). New 
York: Guilford Press. 

Hillyard, S.A. (1984). Event-related potentials and selec- 
tive attention. In E. Donchin (Ed.), Cognitive psycho- 
physiology: Event-related potentials and the study of 
cognition. Vol. I: The Carmel Conferences (pp. 51-72). 
Hillsdale, NJ: Erlbaum. 

Hillyard, S.A., & Kutas, M. (1983). Electrophysiology of 
cognitive processing. In M.R. Rosenzweig & L.W. Porter
(Eds.), Annual review of psychology (Vol. 34, pp.
33-61). Palo Alto, CA: Annual Reviews, Inc. 

Johnson, R., Jr. (1986). A triarchic model of P300 am- 
plitude. Psychophysiology, 23, 367-384. 

Johnson, R., Jr., & Donchin, E. (1978). On how P300 
amplitude varies with the utility of the eliciting stim- 
uli. Electroencephalography & Clinical Neurophysiol- 
ogy, 44, 424-437. 

Kutas, M., & Hillyard, S.A. (1980). Reading senseless sen- 
tences: Brain potentials reflect semantic incongruity. 
Science, 207, 203-205.

Kutas, M., & Van Petten, C. (1988). The N400 and lan- 
guage. In P.K. Ackles, J.R. Jennings, & M.G.H. Coles 
(Eds.), Advances in psychophysiology (Vol. 3, pp. 139- 
187). Greenwich, CT: JAI Press, Inc. 

Lykken, D.T. (1959). The GSR in the detection of guilt. 
Journal of Applied Psychology, 43, 385-388. 

Lykken, D.T. (1960). The validity of the guilty knowledge 
technique: The effects of faking. Journal of Applied 
Psychology, 44, 258-262. 

Lykken, D.T. (1974). Psychology and the lie detector in- 
dustry. American Psychologist, 29, 725-739. 

Lykken, D.T. (1981). A tremor in the blood: Uses and 
abuses of the lie detector. New York: McGraw Hill.

Näätänen, R., & Picton, T. (1987). The N1 wave of the
human electric and magnetic response to sound: A
review and an analysis of the component structure.
Psychophysiology, 24, 375-425.

Office of Technology Assessment (1983). Scientific validity 
of polygraph testing: A research review and evaluation— 
A technical memorandum. Washington, DC: U.S. 
Government Printing Office. 

Podlesny, J.A., & Raskin, D.C. (1978). Effectiveness of 
techniques and physiological measures in the detection 
of deception. Psychophysiology, 15, 344-358. 

Raskin, D.C. (1986). The polygraph in 1986: Scientific,
professional, and legal issues surrounding applications 
and acceptance of polygraph evidence. The Utah Law 
Review, 1986, 29-74. 

Reid, J.E., & Inbau, F.E. (1977). Truth and deception— 
The polygraph technique (3rd ed.). Baltimore: The Wil- 
liams and Wilkins Co. 

Rohrbaugh, J.W., & Gaillard, A.W.K. (1983). Sensory and 
motor aspects of the contingent negative variation. In 
A.W.K. Gaillard & W. Ritter (Eds.), Tutorials in event- 
related potential research: Endogenous components 
(pp. 269-310). Amsterdam: North-Holland. 

Rosenfeld, J.P., Angell, A., Johnson, M., & Qian, J. (1991). 
An ERP-based, control-question lie detector analog: 
Algorithms for discriminating effects within individ- 
uals’ average waveforms. Psychophysiology, 28, 319- 
335. 

Rosenfeld, J.P., Nasman, V.T., Whalen, R., Cantwell, B., 
& Mazzeri, L. (1987). Late vertex positivity in event- 
related potentials as a guilty knowledge indicator: A 
new method of lie detection. International Journal of
Neuroscience, 34, 125-129. 

Rösler, F. (1983). Endogenous ERPs and cognition:
Probes, prospects, and pitfalls in matching pieces of
the mind-body problem. In A.W.K. Gaillard & W. Ritter
(Eds.), Tutorials in event-related potential research:
Endogenous components (pp. 9-35). Amsterdam: Elsevier.

Siddle, D.A.T., & Packer, J.S. (in press). Memory and
autonomic activity: The role of the orienting response.
In J.R. Jennings & M.G.H. Coles (Eds.), Handbook of
cognitive psychophysiology: Central and autonomic
nervous system approaches. Chichester, UK: John Wiley.

Squires, K.C., Squires, N.K., & Hillyard, S.A. (1975).
Decision-related cortical potentials during an auditory
signal detection task with cued intervals. Journal of
Experimental Psychology: Human Perception and
Performance, 1, 268-279.

Walter, W.G., Cooper, R., Aldridge, V.J., McCallum,
W.C., & Winter, A.L. (1964). Contingent negative
variation: An electrical sign of sensorimotor association
and expectancy in the human brain. Nature, 203, 380-384.

Wasserman, S., & Bockenholt, U. (1989). Bootstrapping:
Applications to psychophysiology. Psychophysiology,
26, 208-221.

Appendix A

The stimuli used in the present study

Scenario 1                                Scenario 2
Probe         Target        Irrelevant    Probe         Target        Irrelevant

Blue Coat     Green Hat     Brown Shoes   White Shirt   Green Tie     Beige Suit
                            Red Scarf                                 Red Vest
                            Gray Pants                                Tan Belt
                            Black Gloves                              Black Socks

Phil Jenks    Tim Howe      Ray Snell     Dale Spence   Wayne Brant   Glenn Platt
                            Neil Rand                                 Walt Rusk
                            Gene Falk                                 Tod Ames
                            Ralph Croft                               Earl Dade

Op Cow        Op Pig        Op Horse      Op Spruce     Op Fir        Op Oak
                            Op Goat                                   Op Birch
                            Op Sheep                                  Op Elm
                            Op Mule                                   Op Pine

Rain File     Snow File     Hail File     Owl File      Swan File     Wren File
                            Wind File                                 Duck File
                            Sleet File                                Crow File
                            Fog File                                  Goose File

Sub Plans     Ship Plans    Tank Plans    Brass Plans   Steel Plans   Tin Plans
                            Plane Plans                               Zinc Plans
                            Bomb Plans                                Lead Plans
                            Gun Plans                                 Iron Plans

Perch Street  Shark Street  Cod Street    Lion Street   Fox Street    Deer Street
                            Carp Street                               Wolf Street
                            Pike Street                               Bear Street
                            Trout Street                              Elk Street





(Manuscript received May 24, 1990; accepted for publication October 20, 1990) 




